1,124,549 research outputs found
Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on
one of two extremes - algorithms that obey strict concurrency constraints or
algorithms that obey few or no such constraints. We consider an intermediate
alternative in which algorithms optimistically assume that conflicts are
unlikely and if conflicts do arise a conflict-resolution protocol is invoked.
We view this "optimistic concurrency control" paradigm as particularly
appropriate for large-scale machine learning algorithms, particularly in the
unsupervised setting. We demonstrate our approach in three problem areas:
clustering, feature learning and online facility location. We evaluate our
methods via large-scale experiments in a cluster computing environment.Comment: 25 pages, 5 figure
Dynamic Control Flow in Large-Scale Machine Learning
Many recent machine learning models rely on fine-grained dynamic control flow
for training and inference. In particular, models based on recurrent neural
networks and on reinforcement learning depend on recurrence relations,
data-dependent conditional execution, and other features that call for dynamic
control flow. These applications benefit from the ability to make rapid
control-flow decisions across a set of computing devices in a distributed
system. For performance, scalability, and expressiveness, a machine learning
system must support dynamic control flow in distributed and heterogeneous
environments.
This paper presents a programming model for distributed machine learning that
supports dynamic control flow. We describe the design of the programming model,
and its implementation in TensorFlow, a distributed machine learning system.
Our approach extends the use of dataflow graphs to represent machine learning
models, offering several distinctive features. First, the branches of
conditionals and bodies of loops can be partitioned across many machines to run
on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs.
Second, programs written in our model support automatic differentiation and
distributed gradient computations, which are necessary for training machine
learning models that use control flow. Third, our choice of non-strict
semantics enables multiple loop iterations to execute in parallel across
machines, and to overlap compute and I/O operations.
We have done our work in the context of TensorFlow, and it has been used
extensively in research and production. We evaluate it using several real-world
applications, and demonstrate its performance and scalability.Comment: Appeared in EuroSys 2018. 14 pages, 16 figure
A Unified Model For Developmental Robotics
We present the architecture and distributed
algorithms of an implemented system called
NeuSter, that unifies learning, perception and action
for autonomous robot control. NeuSter comprises
several sub-systems that provide online
learning for networks of million neurons on machine
clusters. It extracts information from sensors,
builds its own representations of the environment
in order to learn non-predefined goals
- …